WILLIAM JOHNSON


The Liberation of Echo: 

A New Hearing for Film Sound 


Sound has been an integral part of the film 
for well over half a century, yet most critics 
and theorists still pay it little more than lip 
service. Even those who do write perceptively 
about sound treat it as a secondary attribute, 
like color, and continue to equate analysis of 
the image with analysis of the film as a whole.

In recent years attempts have been made 
to analyze film sound in greater depth, notably
in special issues of Yale French Studies
and of Screen (which I refer to respectively
as YFS and SST). Yet even here, contributors 
either retain the traditional view of sound 
as appendage to the image or swing to the 
opposite extreme, making sound the puppe- 
teer of the image. One rare exception to the 
general trend is Elisabeth Weis’s study of 
Hitchcock’s use of sound: although concerned
with specifics rather than theory,
Weis consistently attributes equal importance 
to the two channels. 

In this article I hope to show that sound in 
the film is a full and equal partner of the 
image. The first step is to examine some com- 
mon misconceptions about the role of sound 
and indicate why they have remained wide- 
spread for so long. 

The two main aspects of the traditional 
view of film sound are what Rick Altman, in 
the introduction to YFS, refers to as the his- 
torical fallacy and the ontological fallacy. The 
former consists of regarding the image as pri- 
mary because it preceded the advent of the 
sound track. The latter consists of regarding 
the image as the essence of the film and sound 
as a pollutant. 

As many writers have recognized, the his- 
torical fallacy ignores the prevalence of sound 
during the so-called silent era. In addition to 
the standard accompaniment of live music, 
there were experiments with adding sound 
effects and even live dialogue. Moreover, the
prehistory of the cinema had included attempts 
to marry the recording of images with the 
recording of sounds: as Balázs points out, if
those attempts had been successful they would 
hardly have been rejected in favor of the image 
alone. After all, the forms of entertainment 
that influenced the nascent cinema — theater, 
vaudeville and music hall — all relied heavily 
on speech, music and sound effects. 

It is a widely held assumption that film-makers
took a long time to realize the potential
of the sound track. Thus Mitry states:
“Except for rare masterpieces . . . , for several
years [after the advent of recorded sound]
there were only filmed plays and musicals.”
Yet the transition period was in fact remark- 
ably brief. In 1929, the first year in which 
Hollywood was fully committed to the sound 
track, film-makers explored many ways of 
combining sound and image. They freely used 
speech and sound effects in outdoor scenes, 
even in such an unlikely context as Millard 
Webb’s Glorifying the American Girl (whose 
lengthy company picnic sequence challenges 
just about every stereotype of the early sound 
film). Speech and sound effects were also 
freely used with frequent scene changes, cam- 
era movement or unusual camera setups (as in 
Capra’s Flight, where a dialogue in a hangar 
is filmed in high-angle long shot). There were 
also many flirtations with experimental uses 
of sound. Examples include: synchronizing 
a source with a “wrong” sound, like the pizzicato
playing of fingers on lips in Elsie Janis’s
Paramount on Parade; extradiegetic address 
of speech to the camera, as in Lubitsch’s The 
Love Parade; expressionistic variation of 
dynamics, as in the well-known “knife” scene
in Hitchcock’s Blackmail; enjambement of 
sound belonging diegetically to one scene 
through part of the following scene, as in 
Sternberg’s Thunderbolt. By 1931, virtually 
all the uses of sound that can be found in
narrative films today had already been pio- 
neered. In short, most film-makers adapted 
rapidly to the sound track, in my view because
it was not an accretion but a completion.

The ontological fallacy is no longer pro- 
claimed so passionately as in the late twenties 
and early thirties, but it survives in two wide- 
spread assumptions: that sound in general 
plays only a marginal role in film and that 
synchronized speech in particular is frequently 
redundant. Since speech comes under fire for 
other reasons, I deal with it later. 

On the question of marginality, a typical 
assertion is that of Sparshott: “It is the requirements
of the visual image that call for the
elaboration of equipment and the circum- 
stances of display that are fundamental to 
cinema. Film sound has no distinctive quali- 
ties in itself, and can be meaningfully dis- 
cussed only as an adjunct to the visual.” Even 
critics who show a greater interest in sound 
allow it hardly any wider scope than does 
Sparshott. Thus Raymond Bellour, in his 
analysis of Gigi, automatically rejects any
possibility that the structure of this musical 
might depend significantly on the sound track: 
“The image is sovereign on the level of syn- 
tagmatic demarcation. ...” Yet in Demy’s 
The Umbrellas of Cherbourg, to take a clear- 
cut example, the duration of the whole film 
and of many individual scenes, the tempo of 
many actions and gestures, and the season and 
time of day were all determined to a large 
extent by the prerecorded sound track. In 
any instance where the sound may have a pre- 
determined duration or structure the image 
may be viewed as the adjunct. It is the onto- 
logical fallacy that leads critics to analyze 
films as if the image were determined first, 
independently of the sound. 

The same assumption also underlies — and 
undermines — several of the contributions to 
YFS. Claudia Gorbman refers persistently 
to “film/music relationships” as if the music 
were not part of the film. Daniel Percheron 
declares that “the opposition, sound ‘on’/
sound ‘off’ . . . depends on the image, and 
consequently testifies to the image’s primacy.” 


As we go to press, Columbia University Press has just released 
an anthology of articles, Film Sound: Theory and Practice. Wil- 
liam Johnson will review this volume for us in our following issue. 


Yet this primacy exists only in the nomen- 
clature, not in the film: it is just as accurate to 
consider the image “on” or “off” the sound 
as the reverse. Metz, who recognizes this par- 
ticular bias, unintentionally promotes another 
as he examines what appears to be our Western
cultural valuation of “visual objects”
over aural objects: “From a logical point of 
view, ‘buzzing’ is an object, an acoustic object 
in the same way that a tulip is a visual object. 
... As soon as it becomes a question of nam- 
ing the concept of aural object itself, it is 
necessary to add to the word ‘object’ the 
epithet ‘aural,’ . . . while no precision is 
required for that which should logically be 
called ‘visual object’: we consider it self-evi- 
dent that a banner is an object (with no adjec- 
tive needed) but we hesitate over a hoot; it’s 
an infra-object, an object that is only aural.” 
What Metz fails to realize is that “tulip” and 
“banner” refer not to visual objects but to 
objects that may be perceived by various 
senses. Thus the visual equivalents of “buzz- 
ing” and “hoot” would include not “tulip” 
or “banner” but “flash,” “blur,” and “red.” 

Altman sets up another specious antithesis
by stating that image-without-sound and sound- 
without-image are not complementary and 
symmetrical because “the former is a per- 
fectly common situation in nature (a person 
standing quietly) while the latter is an impos- 
sibility (sounds are always produced by some- 
thing imageable). Thus the completion of the 
former paradigm depends on the object within 
the image, while the completion of the latter 
depends on the auditor (who must look around 
and find the source of the sound). Images call 
for no action on the part of the auditor.” Yet 
many sounds are not imageable in any prac- 
tical sense, beginning with the “room presence” 
that stands for silence and continuing through 
sounds with vast or nebulous sources, such as 
wind or thunder, or hidden sources, as in most 
powered vehicles, phones, radios, etc. Moreover,
sound-without-(visible)-image has its
counterpart in images where the source is visi- 
ble but the sound is suppressed, as when 
characters speak at a distance from the camera 
or behind glass; and then the image does call 
for action on the part of the auditor, as he or 
she tries to supply the missing sound. By 
contrast, there can be scenes in which the 
auditor is under no compulsion to find the 
source of a sound “off.” A classic example
is the ending of Robson’s The Seventh Victim 
when the sound of an overturned chair behind 
a closed door gives the auditor all the neces- 
sary information — that a character has hanged 
herself. 

If sound does have equal ontological status 
with the image, it may be wondered why the 
assumption of image primacy is so widespread. 
Sparshott alludes to one reason in the state- 
ment cited earlier: “the circumstances of 
display that are fundamental to the cinema.” 
In other words, the layout of the movie thea- 
ter, especially the seating, is determined by 
the image. But this would imply primacy only 
if it conflicted with aural needs, which it does 
not, since the nature of sound propagation 
and perception imposes few constraints on the 
seating. In any case, the typical movie theater 
follows the same basic seating pattern as the 
legitimate theater, the music hall and the con- 
cert hall, in all of which sound is paramount. 

A second reason for the assumption of 
image primacy is that full acceptance of sound 
would seem to validate the occasional attempts 
of other sensory channels (smell and touch) 
to enter the film process. Although such 
attempts have led nowhere, couldn’t techno- 
logical advances one day bring us smell-on- 
film and touch-on-film? Don’t arguments 
against these other senses echo the pronounce- 
ments of many twenties critics that sound 
would be a passing fad? 

There are nevertheless good reasons for 
believing that sound and image belong toge- 
ther in the film while the other senses will 
never play more than a novelty role. Taste and 
the various senses of touch are localized in or 
on the body, so that in any realistic film the 
“taster/feeler” would have to be identified
with one character at a time (a problem that 
Aldous Huxley glosses over in describing the 
feelies of Brave New World). Smell is the 
only other sense that can consistently receive 
information from a distance, like vision and 
hearing, but its chemical means of transmis- 
sion makes it slow to take effect and cumber- 
some to control. (Direct neuronal stimulation 
would bypass these difficulties only to raise 
greater ones.) Psychophysically, vision and 
hearing carry far more information than the 
other senses: the channel capacity is 40 bits
per second for the eyes and 30 bits per second 
for the ears; third place goes to the skin with 
only 5 bits per second, while taste and smell 
are estimated at 1 bit per second each. Image 
and sound can be made to carry information 
rapidly and precisely to viewers/auditors who,
without special training, can instantly identify 
a particular face or voice — while most of us 
cannot distinguish a tulip from a rose with 
our nostrils alone or a dime from a penny 
with our fingers. 

Valuation of the image over the sound track 
was reinforced by the circumstances of the 
latter’s birth. Whereas the image had begun 
as a single strip of film, and could therefore 
be thought of as an entity even after it became 
a complex tissue of shots, the sound track 
started out as an obvious assemblage of diverse 
sources. The production of film music already 
existed as a separate enterprise; speech became 
a prime concern of actors and director; and 
sound effects fell by default to other special- 
ists. So today, while analysts do not question 
the unitary status of the image (the Grande 
Syntagmatique, for example, was designed to 
cover the entire image strip, no matter what 
variations there might be in content, camera 
position, lighting or montage), they invariably 
attack the sound track by dividing it into three 
virtually autonomous parts — speech, music 
and sound effects. Thus Stephenson and 
Debrix can assert, “There are not different
kinds of image as there are different kinds of 
sound — music, speech and noises,” forgetting 
that the image strip may include stock shots, 
second unit footage, model shots and other 
special effects, and additional shooting after 
the first cut. The completed sound track is no 
more heterogeneous than the image strip. 

Even though speech, music and sound 
effects may be produced separately, they 
clearly have no functional autonomy. Thus 
speech can function like sound effects, stressing
its concrete origin: a distinctive timbre
of the voice, or a modifying channel such as 
a phone, a radio or (as in the Gregoretti epi- 
sode of RoGoPaG) a buzzing throat mike. 
Speech can also function like music, stressing 
a rhythmic or intonational pattern, as in the 
chanting of a crowd or the newsreel narration 
in Citizen Kane, not to mention such experi- 
mental uses of rhythmic speech as in Paul 
Sharits’s T,O,U,C,H,I,N,G. Sound effects
can function like music, as in the rhythmic 
hamburger-slapping of the group in Sayles’s 
Return of the Secaucus Seven, or most obvi- 
ously in the opening of Mamoulian’s Love 
Me Tonight with its cumulative symphony of 
hammer blows, pickax crashes, and snores. 
Sounds can also convey information in place 
of language, as with coded knockings and 
horn blowings or with the cultural associa- 
tions of sirens, whistling, finger tapping, and 
the like. Music can function like sound effects, 
either mimetically as in Gremillon’s Remorques, 
where choral music is blended with storm 
noises, or as a diegetic background (the appar- 
ent product of a radio, stereo, street band, 
etc.); and it can be made to carry coded infor- 
mation, not only through such familiar 
“words” as the Wedding March and the Mar- 
seillaise but also by more individual means, 
such as the party music that continues faintly 
through a shot of lonely Sylvia Sidney in 
Sternberg’s An American Tragedy.

In spite of this continual overlap of func- 
tions, many writers still condemn speech as 
inherently redundant to the image while accept- 
ing music and sound effects. Yet just how 
much is redundant when we see and hear a 
character speaking? Even if we can lipread 
and the character remains in full face, we 
cannot tell voice quality or intonation without 
hearing the spoken words. Besides, synchro- 
nized speech serves other purposes than the 
conveying of immediate information. In 
Hitchcock’s Vertigo, for example, the deliber- 
ate dialogue scenes between James Stewart 
and Barbara Bel Geddes throw Stewart’s 
growing obsession with Kim Novak into sharp 
relief. 

In recent years, Marxist and Marxist-in- 
fluenced analysts have reinforced traditional 
prejudices against film speech by focusing on 
what they consider its ideological role in 
capitalist societies, especially the US. In this 
view, redundancy of sound and image is essen- 
tial to bourgeois films because it creates an 
illusion of unity that can assure the viewer/ 
auditor of his/her own unity within the domi- 
nant ideology. Thus Stephen Heath asserts: 
“The stress is everywhere [in the classical 
cinema] on the unity of sound and image and 
the voice is the point of that unity: at once
subservient to the images and entirely domi- 
nant in the dramatic space it opens in them. 
...” Mary Ann Doane goes so far as to say
that dialogue is privileged on the sound track 
in order to “preserve the status of speech as 
a property right. ...” 

However, the ideological case against syn- 
chronous sound rests on shaky evidence. 
Marxist critics invariably compare the use of 
sound in the commercial cinema with that in 
the films of Godard, Straub-Huillet, Oshima 
and others who work in critical opposition 
to, though still within, capitalist bourgeois so- 
cieties. Yet the great majority of films made 
in other parts of the world display the same 
traits (including synchronized, privileged 
speech) that allegedly stigmatize the bour- 
geois cinema. This is true even of a sophisti- 
cated, consciously Marxist film such as Gu- 
tierrez Alea’s Up to a Certain Point. 

In any case, the prevalence of synchronized 
speech in the commercial film has been exag- 
gerated. In most films, as speaking characters 
move, turn their heads, are cut “off,” etc., 
synchronization can be identified only inter- 
mittently; when sync speech is sustained, it 
usually challenges the realistic conventions, as 
in two examples in which Spencer Tracy ad- 
dresses the camera directly: Cukor’s Edward, 
My Son and Minnelli’s Father of the Bride. 
Not surprisingly, the most extensive use of 
sync speech is found in “art” films, such as 
Malle’s My Dinner with Andre and Straub- 
Huillet’s Class Relations, and above all in 
experimental films — Hodgdon’s The Whole 
Film, A Filmic Relationship and A Prepared 
Text, Greenaway’s Act of God, Landow’s 
Wide Angle Saxon, and much of Snow’s 
rambling exploration of image-sound relations,
Rameau’s Nephew.

Tom Levin is quite correct when he refers 
to the “specificity of the acoustic and the 
visual domains” and states that “the closer 
word and image are coupled the greater the 
contrast between them becomes manifest.” 
But his conclusion — that this contrast poses 
a threat to the realistic narrative cinema be- 
cause it is “an accurate symptom of a culture 
whose members are alienated from them- 
selves” — would, if true, apply to every film- 
producing society in the world. In fact, the 
contrast between sound and image has been 
a continuing source of strength in all kinds
of films. Sound and image have remained 
independent not only covertly, by virtue of 
their separate recording processes, but also 
to a large extent overtly, in their relations 
within the finished film. The rest of this article 
examines the scope of those relations. 

Sound and Image Compared 

The crucial condition of sound and image 
in film is that they are both distinct and com- 
parable. 

Points of distinction. Because the image is 
usually taken as the norm from which sound 
deviates, differences between the two channels 
tend to be exaggerated. To deal briefly with 
some common assumptions: (1) The image is 
sharply bounded while sound is not. But the 
microphone records a limited field of sound 
much as the lens records a visual field, even 
though the two rarely coincide. In any case, 
reflections, shadows and halation can chal- 
lenge the sharp boundary of the screen image. 

(2) The image is spatial while sound is tem- 
poral. But the image can assume a temporal 
quality when it is systematically displaced, 
either by rapid cutting or by camera/lens
movement; and sound can indicate spatial 
shifts when it changes with or within scenes. 

(3) The image is susceptible to frequent change 
by cutting while sound is not (in a realistic 
context, at least). But this assumes that cut- 
ting is the most important means of film seg- 
mentation, and even for the image it is hardly 
self-evident that a major change during a con- 
tinuous scene — an explosion of movement, 
the sudden entry of a character, the switching 
on or off of lights — marks less of a break than 
a cut. Fred Mogubgub’s Enter Hamlet (in 
which a different picture accompanies each 
word of the “To be or not to be” soliloquy) 
serves as a reminder that a continuous but 
articulated sound may convey the same effect 
of segmentation as a strip of cut images. 

The one important distinction between 
sound and image stems from their physical 
origin. The majority of objects that are repre- 
sented in the photographic image do not gen- 
erate light but reflect it (and on the screen, 
the entire image consists of reflected light). 
By contrast, virtually all objects that emit 
sounds do so actively, through some kind of
movement, and this remains true of the thea- 
ter loudspeakers that reproduce those sounds. 
Thus sound tends to be a series of active events 
that is experienced while the image tends to be 
a static display that is read. This distinction 
enables the two channels to complement each 
other, carrying large amounts of information 
without mutual interference.

Points of comparison. These embrace both 
primary codes — formal or cinematic — and 
secondary codes — cultural or extracinematic. 

At the formal end of the range, sound and 
image may share the extreme similarity of 
being either present or absent. The absence 
is rarely absolute, however: for the image, 
it consists of the perceptible grayness of the 
“black” or imageless screen, and for the 
sound, of a faint background hum. Thus in 
each channel the presence of absence consti- 
tutes a ground of comparison with the other. 

Sound and image can also be compared in 
terms of such basic formal parameters as 
duration, intensity, multiplicity and rate of 
change. Some of these are directly compar- 
able, such as the duration of an image (event) 
and of a sound (event), or the multiplicity of 
elements in each. Others involve synaesthetic 
equivalences, as between the intensity of the 
image (brightness) and of the sound (a com- 
bination of pitch and loudness). Even in these 
latter cases, however, the relation is not merely 
subjective but derives from physically meas- 
urable conditions. 

As Alan Williams notes briefly in YFS,
both sound and image are subject to virtually 
the same set of manipulations. Just as the 
camera can be placed at varying distances 
from the subject, or moved bodily in the pro- 
cess of recording the image, so can the micro- 
phone; and the use of unidirectional and 
omnidirectional mikes corresponds to the use 
of long- and short-focus lenses. Punctuation 
such as the cut, fade and dissolve is as familiar 
on the sound track as in the image strip. All 
of the most widely used manipulations — both 
modal and segmental — are directly compara- 
ble between the two channels. 

At the cultural end of the range of simi- 
larities, both sound and image may be derived 
from profilmic or synthetic material. In the 
former case, the material used in either chan- 
nel may be staged or unstaged, diegetic or 
extradiegetic, subjective or objective in point
of view/hearing.

With all of the foregoing parameters sound 
and image may vary independently, even in 
realistic narrative films: thus at any given 
moment the two channels will be perceptibly 
similar or dissimilar in at least one of many 
ways. For example, sound may be present 
while the image is absent (the screen is black) 
as at the end of LeRoy’s I Am a Fugitive from 
a Chain Gang, when Paul Muni fades into the 
darkness with the words “I steal”; the image 
may be present while sound is absent (the 
sound track is silent), as in Joseph Lewis’s 
The Big Combo when Brian Donlevy’s hear- 
ing aid is pulled out (a scene that also illus- 
trates the combination of objective image and 
subjective sound); and of course, as more 
usually occurs, sound and image may be co- 
present. 

Special relationships. Discussion of sound 
in films often concentrates on two special rela- 
tionships between sound and image: the tem- 
poral link of synchronization and the spatial 
link of so-called “sound on” and “sound 
off,” for which I prefer the unbiased terms 
conjunction and disjunction. However, these 
two relationships are ancillary to the points 
of comparison outlined above. 

Although synchronized speech is usually 
treated as a separate category, the two special 
relationships obtain with all kinds of sound. 
Their main effect is to qualify the basic simi- 
larities and dissimilarities between sound and 
image. The weakest case is conjunction, which 
may be little more than the neutral bracketing 
of the two channels. The other three cases — 
disjunction, synchronization, and what may 
be called anti-synchronicity (the deliberate 
avoidance or disruption of sync) — can all add 
emphasis to either a similarity or a dissimilarity.
The strength of the emphasis will depend
on the context, varying inversely with the fre- 
quency or expectedness of the case. In The 
Love Parade, for example, the cutting of 
scenes in time with rhythmic noises is an infre- 
quent use of sync that strongly emphasizes the 
temporal similarity of sound and image. 

Sound and Image Together 

Two important questions remain: what do 
the sound-image relations signify, and how
does meaning emerge from their abundance? 

The answers that follow are tentative, being 
based on a rudimentary structural system of 
interactions between the two channels. As 
simple as this system is, however, it does help 
to resolve many problems arising from the 
assumption of one dominant channel in the 
film. 

Two caveats. First, the system is not meant 
to replace the perception of meaning in the 
individual channels: instead, it qualifies and 
supplements that meaning. Second, in order 
to test the general validity of the system, I 
cite examples ranging from commercial enter- 
tainments to serious political and experimen- 
tal films. The fact that my focus is what these 
films have in common is not meant to deny 
their differences. 

In each of the main groups of relations
(points of comparison) there are at any given 
moment two basic possibilities: sound and 
image may coincide or differ. The former 
case may be termed confirmation; the latter, 
opposition. Here are some examples of the 
basic paradigms: 

Confirmation. In Bergman’s The Seventh
Seal, Max Von Sydow as the knight relaxes 
with Nils Poppe and Bibi Andersson in an 
impromptu picnic. Von Sydow refers to the 
mood of unexpected serenity by saying quietly, 
“I shall remember this moment. . . .” As he 
raises a bowl of milk to drink, reflections 
from its surface cause his face to literally 
light up. Both the sense and acoustic quality 
of the speech combine with the double reflec- 
tion of the image in a vivid demonstration of 
the mood. 

In Lester’s A Hard Day’s Night, the Beatles
run down a fire escape to the accompaniment 
of “Can’t Buy Me Love.” The camera, placed 
below, rotates to follow the singers as they 
zigzag from flight to flight; at the same time, 
sunlight burns radiant diamond-shaped pat- 
terns in the openwork of the steps. As the 
song reaches the line “I’ll buy you a diamond
ring, my friend,” the conjunction of the 
words “diamond” and “ring” with the dia- 
mond-shaped patterns and the circular move- 
ment of the image creates a sense of euphoric 
fitness. 

At the end of Meszaros’s Diary for My 
Children, set during the post-World War II 
Stalinist era in Hungary, the young protago-
nist goes to visit her jailed mentor. They have 
to speak through two separate chain-link 
fences in an enclosed yard. Her last words 
are “You’ve turned gray”; after that the 
image freezes, enabling the viewer to take 
stock of the grayness of the scene, and then 
fades to a uniform gray. Aided by the addi- 
tional segmentation of the image, the remark 
expands from a physical description of a 
man’s hair to a metaphor for a whole society. 

In Ashby’s Being There, Peter Sellers plays 
a gardener who has never known any life 
outside his wealthy employer’s town house. 
Then the employer dies. As Sellers sits on the 
bed gazing at the corpse, there is a repeated 
disjunctive noise — the whining attempt to 
start an automobile engine. Although the 
sound serves a naturalistic purpose, its dis- 
junction also imposes a relation with the silent 
focus of the image, Sellers confronting the
corpse: subjectively, it suggests the wish that 
he could restart the vital functions of his 
employer; objectively, it foreshadows the 
difficulty he faces in making a fresh start in 
life. 

Opposition. In Citizen Kane, there is a 
strong contrast of intensities when Welles as 
Kane dies: a big close-up of his mouth accom- 
panies his almost whispered “Rosebud” — an 
opposition intensified by the obvious synchro- 
nization. The ending of Szabo’s Mephisto 
reverses the contrast: Klaus Maria Brandauer, 
as the non-Nazi actor who has gone along 
with the Nazis in post-1933 Germany, is seen
in long shot as a small spotlighted figure in a 
vast arena, while the deep voice of Rolf 
Hoppe (as a Nazi general whom the actor 
thought he could influence) booms at him 
over loudspeakers. Here, the opposition is 
intensified by its conformity to the physically 
active nature of sound and passive nature of 
the image. In each of these two examples, 
the unusual combination of sound and image 
draws attention to a pivotal development — 
a clue to Kane’s character, the exposure of the 
actor’s delusion. 

A different pair of contrasts is found in 
Lang’s Fury. At the trial of the lynch mob, 
silent newsreel footage is projected as evidence
for the prosecution. The silence heightens
the abnormality of the defendants’ expressions
of hate and glee as they set fire to
the jail and sabotage attempts to save it. The 
presumed victim, Spencer Tracy, is alive but 
in hiding, bent on revenge. When his brothers 
and fiancee urge him to reveal the truth he 
yells, “I don’t need other people!” and stalks 
out into the night. Then, feeling alone in the 
deserted streets, he enters a bar from which 
music is blaring, only to find it empty except
for a bartender, who stands for a moment 
like a statue. Here, the combination of loud 
sound and virtually immobile image heightens 
the sense of aloneness. 

Three points emerge from these examples. 
The first point concerns the role played by 
spoken (or sung) language in sound-image 
relations. In the examples from The Seventh 
Seal, Citizen Kane and Mephisto, the relation 
depends partly on the acoustic properties of 
speech: while the sense of what Von Sydow 
and Hoppe say contributes to the relation, 
the words they use could be paraphrased with- 
out causing any significant change. With A 
Hard Day’s Night and Diary for My Children,
however, the relation hinges on a precise 
choice of words. As far as sound-image rela- 
tions are concerned, there is no essential dif- 
ference between meaningful speech and any 
other kind of sound. 

Second, it is clear that image-based analysis 
would by no means class all of the relations as 
paradigmatic, since they range in duration 
from a brief part of a scene (Kane) to two or
more fairly lengthy scenes (both Fury exam- 
ples, Mephisto). On the basis of sound-image 
relations, however, even the lengthiest exam- 
ples remain unitary, for one of two different 
reasons. The dominant relation may simply 
be repeated, as in the Fury trial, Mephisto 
and also the single but lengthy scene in Being 
There. Alternatively, the relation may be 
founded on — and measured by — a continuous 
sound passage rather than an image scene, as 
in the Fury bar episode with its continuous 
blare of music. It is important to analyze a 
film on the basis of sound-image relations 
rather than by image segments alone. 

The third point is that some of the foregoing
examples involve additional sound-image
relations. Thus A Hard Day’s Night
contains the more general confirmation of 
vigorous image movement and vigorous music; 
the quiet background in The Seventh Seal
matches the near-stillness of the image; and 
in Diary for My Children, the metronomic 
clack of a pacing guard’s boots echoes the 
geometric division of the image space by chain- 
link fences. While each of these supplemen- 
tary relations happens to support the domi- 
nant relation, the reverse is equally possible: 
thus in Mephisto there is an element of con- 
firmation between Hoppe’s loud voice and the 
size of his body in the foreground of the 
image. In view of the multiplicity of sound/ 
image parameters, few synchronic relations 
are likely to be simple. How can such complex 
relations be assessed? 

Provisionally, it seems that there is an addi- 
tive effect (with counter-examples assigned a 
negative value). In other words, supplemen- 
tary relations of the same sign increase the 
strength of the confirmation or opposition, 
while those of opposite sign weaken it. The 
total effect varies with the strength of the indi- 
vidual relations, especially the dominant one. 
In what follows, however, I distinguish only 
two levels of intensity for each basic relation 
—strong and weak. Strong implies that the 
dominant relation is intense and/or is sup- 
ported by supplementary relations of the same 
sign, with few or no counter-examples. Weak 
implies that the dominant relation is nonintense 
and/or is supported by few or no relations of 
the same sign and/or is accompanied by one 
or more significant counter-examples. 

There are also, of course, sound-image 
units in which the sum is close to zero, either 
because there is no marked confirmation/ 
opposition or else because relations of both 
types are numerous. (An example of the 
former might be a long shot of an empty land- 
scape accompanied by fairly bland music; of 
the latter, the segment near the opening of 
Rear Window in which the voice of James 
Stewart’s editor is heard on the phone, together 
with diegetic sounds, while the camera roams 
around the room and the courtyard.) The 
meaning of such units depends largely on the 
context. In fact, even the strong units cannot 
be fully appraised in isolation but must be 
related to their syntagmatic organization. 

Sound-image syntagmas. Since sound-image 
units can vary greatly in length, and since the 
segmentation of sound often does not coincide
with that of the image, it may seem difficult
to determine where one unit ends and another 
begins. In the Seventh Seal example, since the 
milk-drinking obviously does not coincide 
exactly with Von Sydow’s words, and since 
he continues speaking for some time, where 
does the cited unit of strong confirmation 
yield to another, weakly confirmative unit? 

As with the summing of individual units, 
precision seems unnecessary. In the foregoing 
example there is a continuing balance of
confirmation with no significant change in sound
or image. The relation established by one unit 
remains in force until the advent either of a 
different relation or of the same relation 
expressed with a marked difference. Deter- 
mining just how many basic units there are in 
this higher-level segment is unimportant — at 
least at this stage. 

Although the beginning or end of a tradi- 
tional sequence may coincide with a signifi- 
cant change in sound-image relations, sequences 
based on the image alone can diverge so widely 
from those based on both channels together 
that I will refer to the latter as passages. A 
good example of the divergence between se- 
quences and passages is provided by Conway’s 
Red Headed Woman, a crypto-feminist film 
whose effectiveness is hard to account for in 
image-based terms. The film is a star vehicle 
for Jean Harlow, and since the outstanding 
element in her performance is her voice, it is 
easy to dismiss the direction of the image as 
tributary and inferior. Yet the interaction 
between sound and image organizes the ap- 
parently rambling plot into a strong chain of 
passages. At the beginning of the film, for 
example, Harlow attempts to seduce her boss 
Chester Morris and is frustrated by the return 
of his wife Leila Hyams. Harlow’s uncere- 
monious departure from Morris’s house is 
followed by a close-up of her laughing. The 
combination of large image and loud laughter, 
the strongest confirmation in the film so far, 
proclaims the end of one passage and
prepares the viewer/auditor for another. Later,
after Morris and Hyams have divorced, 
Hyams visits him in hopes of a reconciliation 
only to find that he has just married Harlow: 
in a tony voice, Hyams delivers an intense 
wronged-woman speech and makes a proud 
exit, at which point Harlow regains control 
by turning up the volume on her perky whine
and uttering just three unexpected words: 
“That cheap thing!” Here it is a strong oppo- 
sition that marks the end of the passage: the 
contrast between Harlow’s words and tone of 
voice is reflected and reinforced by her petu- 
lant expression and arms-akimbo pose. 

There can be interactions on a larger scale 
between different sound-image passages. The 
mood of serenity in the Seventh Seal interlude 
derives partly from its contrast with two pre- 
ceding passages — the procession of flagellants 
and the attack on Nils Poppe at the inn. Both 
represent strong confirmation of high inten- 
sity: loud sound and continual image move- 
ment (plus, in the procession, a multiplicity 
of image elements). The euphoric scene in 
A Hard Day's Night stands out by contrast 
with a more static preceding passage, in the 
same way as traditional musicals intensify the 
strong confirmation of song plus dance by 
means of opposition or weak confirmation in 
the surrounding nonmusical passages. 

Above the sound-image passage there does 
not usually seem to be any significant level of 
organization other than the whole film. Locat- 
ing the boundaries of this structural unit 
causes few problems, of course. But the rela- 
tion between the passage and the whole can 
be more elusive. It can be analyzed most easily 
in experimental films that have little or no 
narrative detail. Dwoskin’s Jesus' Blood Never 
Failed Me Yet consists of many repetitions of 
a single image scene (a man walking toward 
the camera) accompanied by a male voice 
singing the evangelical hymn of the title. After 
a while soft chords from a chamber orchestra 
enter behind the voice and become progres- 
sively fuller in sound as they continue from 
repetition to repetition. The passage shift 
from weak confirmation (fairly close image 
of man, fairly loud voice) to weak opposition 
(distant image, fairly loud voice) at the end 
of each repetition fades in importance beside 
the gradual shift in sound-image relations that 
extends through the whole film. Here the 
latter shift is easy to identify because the 
image is repeated. Otherwise, an analysis 
based primarily on the image might not go 
beyond the former. 

Because sound and image share nearly all 
of the same parameters and manipulations,
they can take either side of any particular
relation and, indeed, frequently exchange 
sides in a continuing or repeated relation. A 
case in point is the use of the voice-over narra- 
tor. Brian Henderson states^'* that in classical 
films such narrators are “jerked on and off 
stage in a manner that is quite undignified” 
and are “ludicrous stand-ins for the novelistic 
‘I,’ ” yet adds: “This is not to disparage the
convention of the voice-over in cinema, which 
has figured in so many excellent films.” Ear- 
lier Henderson hints at a resolution of this 
apparent contradiction: “Of course, voice- 
over narration in cinema does not comprise 
the whole text [as character narration can in 
literature]. ... It is just one element among 
many elements, to be juggled along with them, 
often in shifting combinations.” It is pre- 
cisely as one sound element among the shift- 
ing combinations of sound and image that 
voice-over narration may be “jerked on and 
off stage” and yet form an integral part of the 
film. 

Exchanges between the two channels affect 
not just voice-over narrations but all kinds 
of sounds (and images) as well. A sound- 
image relation may persist even though the 
individual parameters of the sound and the 
image change. For example, if lengthy images 
accompanied by brief sounds give way to 
brief images accompanied by lengthy sounds, 
there need be only a superficial change: the 
basic long/short opposition continues. This
frequently occurs in Ozu’s later films, where 
sequences involving characters consist of 
lengthy image scenes accompanied by rapid 
and/or brief passages of dialogue, while tran- 
sition sequences consist of brief image scenes 
(streets, alleys, buildings, corridors, etc.) 
accompanied by continuous music. In this 
way Ozu achieves variety without disrupting 
the basic rhythm of his films. 

The fact that sound-image syntagmas may 
have meaning over and above that of the indi- 
vidual relations is demonstrated by Naruse’s 
Meshi. The film revolves around Setsuko 
Hara, a housewife who becomes aware of 
boredom after she and her husband, Ken 
Uehara, move from Tokyo to Osaka; she 
returns to Tokyo on her own, intending to 
take a job, but finally decides to stay with her 
husband. The first half of the film is
accompanied by one of Fumio Hayasaka’s typical
westernized background scores, a vague, sen- 
timental wash of music. Thus there is a con- 
tinual weak opposition that happens to fit 
Hara’s passive discontent. In the second half 
of the film, after Hara decides to leave for 
Tokyo, the music is often absent, and when it 
does occur it is more closely matched to the 
mood — a shift to a continual weak confirma- 
tion that gently underscores Hara’s shift to 
action. Meshi demonstrates that even an insi- 
pid background score cannot be automatically 
condemned, as it usually is, for aesthetic or 
ideological shortcomings. 

One obvious challenge to my proposed 
system is raised by films whose sound-image 
units are in such continual flux that no sus- 
tained or dominant relation emerges at higher 
levels. Yet this flux itself may constitute a 
dominant pattern — a series of contrasts that 
can be used to shock the filmgoer to either 
comic or serious effect. The Marx Brothers 
films offer examples of the former, with the 
brothers themselves displaying a wide range 
of sound-image relations: Groucho uttering 
wisecracks as he moves in his crouching lope, 
or rising to full height to launch a denuncia- 
tion or a song; Chico’s air of innocent rea- 
sonableness as he proposes ridiculous schemes 
in mangled English, or his cheerful piano 
playing with index finger pointed like a gun; 
Harpo’s muteness accompanied by maniacal
enthusiasm, lechery, misery or rage, at
peace only when he plays the harp. 

Both humor and seriousness emerge from 
the flux of such experimental films as Berlin- 
er’s Myth in the Electric Age and Natural 
History, in which a loosely associated series 
of images is paired with independently chosen 
sounds. Some of the pairings are incongruous 
or ironic, such as a sawing noise with the 
image of a woodpecker or piano music syn- 
chronized with a man hammering, while others 
produce a strong confirmation, such as the 
dramatic jet roar that accompanies a time- 
lapse flow of mist down a mountain valley. 

A serious application of sound-image flux 
is found in the early sound films of Vertov, 
Enthusiasm and Three Songs About Lenin.
(Enthusiasm has the distinction of being the
first film in which the sound recordist is actually
visible, just as the cameraman is in Vertov’s Man with the
Movie Camera.) Both image and sound tend
to be cut rapidly, though sometimes sounds 
continue through several image scenes and, 
less often, image scenes continue through sev- 
eral sound phrases. Frequent superimposed 
titles add complexity to the image, often 
changing or repeating in a separate rhythm of 
their own. A single held musical tone may 
accompany a crowd of workers (Enthusiasm)
or the roar of an urban crowd may accompany
a rural scene (Lenin). Images range from big
close-ups to extreme long shots, and undergo 
such evident manipulations as freezing, time 
lapse and prismatic replication. The sound 
track includes synchronized speech, voice- 
over, disjunctive noises and various styles of 
music. Except for some anti-religious scenes 
near the beginning of Enthusiasm, the two 
films attempt no polemic — in fact, their mer- 
curial sound-image relations could hardly 
sustain one. What they do offer is an impres- 
sion of vigor and diversity that is meant to 
characterize the Soviet state. 

The conspicuous role that sound plays in 
Vertov— as in such films as the Tavianis’ Padre, 
Padrone and Godard’s First Name: Carmen— 
is only one aspect of liberated Echo. Since my 
thesis is that sound deserves equal billing with 
the image even when its role is inconspicuous, 
I choose a more naturalistic film to make a 
final point. 

Critical response to Wenders’s Paris, Texas 
has differed most sharply over the peepshow 
palace episode, which is seen as either a high 
point or a letdown. Those in favor of the 
episode could invoke sound-image relations 
to support their view: an exchange of roles 
between the two channels enables the film to 
end both differently from and in balance with 
its beginning, since Harry Dean Stanton 
evolves from one sound-image opposition — 
mobile and mute — to its inverse — static and 
voluble. However, dissenting critics also could 
invoke sound-image relations: the peepshow 
episode does not actually end the film, and 
its dialogue, full of tardy exposition, lacks the 
driving intensity of Stanton’s walking. 

In short, the sound-image system I have 
outlined is not meant to provide easy answers. 
In more positive terms, this theoretical struc- 
ture should not constrain discussion about 
films or foreclose on critical options but,
rather, open up new perspectives on the film. 
documentary — the “rockumentary,” if you
will — developed a routine as predictable as the 
B western. D. A. Pennebaker’s epochal Don't 
Look Back (1967) and Monterey Pop (1968) 
are the obvious prototypes, their makeshift 
mix of concert sequences, cinema verite back- 
stage passages, and color commentary by 
semi-articulate scene-makers inspiring virtual
xeroxes from enterprising film-maker/groupies
desperate for screen subject matter. Early
on, the form attained a creative summit with 
the yin/yang dialectics of Woodstock (1970)
and Gimme Shelter (1971), but from there it 
was mostly downhill. Throughout the next 
decade, bands from Abba to Zappa put their 
acts on celluloid in what had become a cine- 
matic analog to vanity publishing: Alice 
Cooper’s Welcome to My Nightmare,
AC/DC’s Let There Be Rock, Blue Oyster Cult
and Black Sabbath’s Black and Blue, or Paul 
McCartney’s Rockshow. Well before the Roll-